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Abstract — The estimation of a random vector x with indepen- 
dent components passed through a linear transform followed by a 
componentwise (possibly nonlinear) output map arises in a range 
of applications. Approximate message passing (AMP) methods, 
based on Gaussian approximations of loopy belief propagation, 
have recently attracted considerable attention for such prob- 
lems. For large random transforms, these methods exhibit fast 
convergence and admit precise analytic characterizations with 
testable conditions for optimality, even for certain non-convex 
problem instances. However, the behavior of AMP under general 
transforms is not fully understood. In this paper, we consider 
the generalized AMP (GAMP) algorithm and relate the method 
to more common optimization techniques. This analysis enables 
a precise characterization of the GAMP algorithm fixed-points 
that applies to arbitrary transforms. In particular, we show that 
the fixed points of the so-called max-sum GAMP algorithm for 
MAP estimation are critical points of a constrained maximization 
of the posterior density. The fixed-points of the sum-product 
GAMP algorithm for estimation of the posterior marginals can 
be interpreted as critical points of a certain mean-field variational 
optimization. 

Index Terms — Belief propagation, ADMM, variational opti- 
mization, message passing. 



I. Introduction 
Consider the constrained optimization problem 

(x,z) := argmin_F(x, z) s.t. z = Ax, (1) 

x.z 

where x e R", z 6 R m , A 6 M mx " and the objective function 
admits a decomposition of the form 

F(x,z) :=/ B (x) + /,(z) 

n m 

/x(x) Y^L (.»•,). /,(*) =2 /«(*)■ (2) 

j=l i=l 

for scalar functions f x { ) and f Zi (•). One example where this 
optimization arises is the estimation problem in Fig. Q] Here, a 
random vector x has independent components with densities 
PxAxj), and passes through a linear transform to yield an 
output z = Ax. The problem is to estimate x and z from 
measurements y generated by a componentwise conditional 
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Fig. 1. System model: The GAMP method considered here can be used for 
approximate MAP and MMSE estimation of x from y. 



density p y . \ Zi {yi\zi)- Under this observation model, the vectors 
x and z will have a posterior joint density given by 

Px,z|y(x,z|y) = [^(y)]- 1 e- F ( x ' z )l {z=Ax} , (3) 

where F(x, z) is given by (f2|l when the scalar functions are 
set to the negative log densities: 

Note that in (01, F(x, z) is implicitly a function of y, Z(y) 
is a normalization constant and the term 1{ z= ax} imposes 
the linear constraint that z = Ax. The optimization (Q~|i in 
this case produces the maximum a posteriori (MAP) estimate 
of x and z. In statistics, the system in Fig. Q] is sometimes 
referred to as a generalized linear model |fl], [0 and is 
used in a range of applications including regression, inverse 
problems, and filtering. Bayesian forms of compressed sensing 
can also be considered in this framework by imposing a sparse 
prior for the components Xj 0. In all these applications, 
one may instead be interested in estimating the posterior 
marginals p(xj\y) and p(zi\y). We relate this objective to an 
optimization of the form ([l}-© in the sequel. 

Most current numerical methods for solving the constrained 
optimization problem ((TJ attempt to exploit the separable 
structure of the objective function (|2) either through gener- 
alizations of the iterative shrinkage and thresholding (ISTA) 
algorithms |4|-|10| or alternating direction method of mul- 
tipliers (ADMM) approach [11|-[20|. There are now a large 
number of these methods and we provide a brief review in 
Section HI] 

However, in recent years, there has been considerable 
interest in so-called approximate message passing (AMP) 
methods based on Gaussian and quadratic approximations of 
loopy belief propagation in graphical models [21 |-[26|. The 
main appealing feature of the AMP algorithms is that for 
certain large random matrices A, the asymptotic behavior of 
the algorithm can be rigorously and exactly predicted with 
testable conditions for optimality, even for many non-convex 
instances. Moreover, in the case of these large, random matri- 
ces, simulations appear to show very fast convergence of AMP 
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methods when compared against state-of-the-art conventional 
optimization techniques. 

However, despite recent extensions to larger classes of 
random matrices 11271 . 11281 . the behavior of AMP methods 
under general A is not fully understood. The broad purpose 
of this paper is to show that certain forms of AMP algorithms 
can be seen as variants of more conventional optimization 
methods. This analysis will enable a precise characterization of 
the fixed points of the AMP methods that applies to arbitrary 
A. 

Our study focuses on a generalized AMP (GAMP) method 
proposed in [26] and rigorously analyzed in [29|. We consider 
this algorithm since many other variants of AMP are special 
cases of this general procedure. The GAMP method has two 
common versions: max-sum GAMP for the MAP estimation of 
the vectors x and z for the problem in Fig. [T] and sum-product 
GAMP for approximate inference of the posterior marginals. 

For both versions of GAMP, the algorithms produce esti- 
mates x and z along with certain quadratic terms. Our first 
main result (Theorem Q3 shows that the fixed points (x,z) 
of the max-sum GAMP are critical points of the optimiza- 
tion ((TJ. In addition, the quadratic terms can be considered 
as approximations of the diagonal Hessian of the objective 
function. For sum-product GAMP, we provide a variational 
interpretation of the algorithm's fixed points. Specifically, we 
show (Theorem [2]i that the algorithm fixed points can be 
interpreted as means and variances of a certain Gaussian mean- 
field approximation of the posterior distribution. The results 
are compared against the well-known characterization of fixed 
points of standard loopy BP |30|-|33|. 

II. Review of GAMP and Related Optimization 
Methods 

A. Generalized Approximate Message Passing 

Graphical model methods ||34l are a natural approach to 
the optimization problem (Q]i given the separable structure 
of the objective function (|2j. However, traditional graphical 
model techniques such as loopy belief propagation (loopy 
BP) generally require that the constraint matrix A is sparse. 
Approximate message passing (AMP) refer to a class of 
Gaussian and quadratic approximations of loopy BP that can 
be applied to dense A. AMP approximations of loopy BP 
originated in CDMA multiuser detection problems 135 l- ll3"7l 
and have received considerable recent attention in the context 
of compressed sensing [21|-|26|, |38|. The Gaussian approx- 
imations used in AMP are also closely related to expectation 
propagation techniques [39|, |40|. 

In this work, we study the so-called generalized AMP 
(GAMP) algorithm |26| and rigorously analyzed in [29|. The 
procedure is shown in Algorithm Q] and produces a sequences 
of estimates x* and z* along with what we will call quadratic 
terms r^,T*, .... We focus on two variants of the GAMP 
algorithm: max-sum GAMP and sum-product GAMP. 

In the max-sum version of the algorithm, the outputs (x*, z') 
represent a sequence of estimates of the solution to the 
optimization problem (Q~|i, or equivalently the MAP estimates 
for the posterior (|3). Each iteration of the algorithm involves 



Algorithm 1 Generalized Approximate Message Passing 
(GAMP) 

Require: Matrix A, functions f x (x), f z (z). and algorithm 
choice MaxSum or SumProduct. 

1: t <r- 

2: Initialize x*, t* 

3: s*- 1 <- 

4: S |A| 2 (componentwise magnitude squared) 

5: repeat 

6: {Output node update} 

7: r* <- 

8: p* «— Ax* — S* Vp 

9: if MaxSum then 

10: z* arg min z F z (z, p 4 , t*) 

11: T 2 *^ 7^/(1 + 7^ d 2 / Z (z 4 )/^ 2 ) 

12: else if SumProduct then 

13: Z* «-E(z|p',<r£) 

14: T l z <r- Vai^zlp 4 ,^) 

15: end if 

16: S* «- (z* - p*)/T^ 

17: T*<-l/r*-T*/(7# a 

18: 

19: {Input node update} 

20: r* <- l/(S T Ti) 

21: r* 4r- X* + T*A T S* 

22: if MaxSum then 

23: x t+1 <— arg min x F x (x, r* , r* ) 

24: T* +1 <- T*/(l + T^d 2 f x (^)/dx 2 ) 

25: else if SumProduct then 

26: x t+1 <-E(x|r*,T r *) 

27: r^ +1 <— var(x|r*, t*) 

28: end if 

29: until Terminated 



computing minimizations shown in lines [To] and [23] whose 
objective functions are given by 

F B (x,r,T r ) := / x (x) + i||x-r|| 2 i ., (4a) 

F z (z,j>,t p ) := ^( z ) + i||z-p|| 2 p , (4b) 

where we use the notation that, for any vectors v and r 6 W 
where r has positive components: 

r I?). I 2 
l|v||^:=E— ■ 

Since the objective function has a separable form (|2), it can 
be verified that the minimizations in lines [10] and [23] can be 
performed componentwise: 

z \ = prox t t (pi), = prox Tt , (r,-), (5) 

where prox^(-) is the so-called proximal operator: 

proxi(w) := argmin/(w) + -\u — v\ 2 . (6) 

In this way, the max-sum GAMP algorithm reduces the vector- 
valued optimization (Q]i to a sequence of scalar optimizations. 
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Similarly, it can be verified that the quadratic terms in lines 
[TT1 and [24] depend on the derivative of the proximal operator: 



r t+1 = r* 



(7) 



For the sum-product GAMP algorithm, the outputs (x*,z*) 
represent the posterior means for the density (|3}, or equiva- 
lently, the minimum mean-squared error (MMSE) estimates 
of (x, z) given y. As discussed in ||26l . the algorithm also 
provides estimates of the posterior marginals. The expectations 
and variances in lines Qj] [14] [26] and [27] of the algorihtm are 
to be taken with respect to the probability density functions: 



p(x|r,T r ) 
p(z|p,T P ) 



(X 
(X 



exp[-F x (x,T,T r )] (8a) 
cxp[-F 2 (z,p,T p )], (8b) 



where F x (-) and F z (-) are defined in Under the separa- 
bility assumption (fJJ, these density functions factor as 



p(x\r,T r ) 



P(z|p,T p ) 



n 



exp 



J|cxp 



~fxj i x j) 



-fM 



2r r 



\zj -ft | S 
2t„.. 



(9a) 



(9b) 



Thus, the expectations and variance computations can be com- 
puted componentwise and the sum-product GAMP algorithm 
reduces the vector-valued estimation problem to a sequence of 
scalar estimation problems. 

B. Iterative Shrinkage and Thresholding Algorithm 

The goal in the paper is to relate the GAMP method 
to more conventional optimization techniques. One of the 
more common of such approaches is a generalization of the 
Iterative Shrinkage and Thresholding Algorithm (ISTA) shown 
in Algorithm [2] g)-®. 

Algorithm 2 Iterative Shrinkage and Thresholding Algorithm 

(ISTA) 

Require: Matrix A, scalar c, functions f x ( ), /*(•)• 



t <r- 

Initialize x*. 
repeat 

z* <— Ax* 

q* ^%(z*)/dz 

x t+1 <- argmin x f x (x) + (q') T Ax+ (c/2)||x - 
until Terminated 



_t II 2 



The algorithm is built on the idea that, at each iteration t, 
the second cost term in the minimization arg min x f x (x) + 
f z (Ax) specified by (Q} is replaced by a quadratic majorizing 
cost g 2 (x) > / 2 (Ax) that coincides at the point x = x* (i.e., 
<?z(x*) = / 2 (Ax*)), where majorization can be achieved via 
appropriate choice of c > 0. This approach is motivated by the 
fact that, if / x (x) and / 2 (z) are both separable, as in (fJJ, then 
both the gradient in line [5] and minimization in line [6] can be 
performed componentwise. Moreover, when f x {x) = A||x[|i, 
as in the LASSO problem [41 j, the minimization in line [6] 
can be computed directly via a shrinkage and thresholding 



operation - hence the name of the algorithm. The convergence 
of the ISTA method tends to be slow, but a number of enhanced 
methods have been successful and widely-used [7|-[10|. 

C. Alternating Direction Method of Multipliers 

A second common class of methods is built around the 
Alternating Direction Method of Multipliers (ADMM) ifTTI 
approach shown in Algorithm [3] The Lagrangian for the 
optimization problem (HJ is given by 



L(x, z, s) := F(x, z) + s T (z — Ax), 



(10) 



where s are the dual parameters. ADMM attempts to produce 
a sequence of estimates (x*,z*,s*) that converge to a saddle 
point of the Lagrangian (TTOb . The parameters of the algorithm 
are a step-size a > and the penalty terms Q z (-) and Q x {-), 
which classical ADMM would choose as 



g x (x,x t ,z*,a) = -||z*-Ax|| 
g 2 (z,z*,x t+1 ,a) = -||z-Ax 



t+l||2 



(11a) 
(lib) 



Algorithm 3 Alternating Direction Method of Multipliers 
(ADMM) 

Require: A, a, functions f x (-), / 2 (-), Q x (-), Q z {-) 



t <- 

Initialize x*, z', s* 
repeat 

x t+1 argmin x L(x, z', s') 
z t+1 <— arg min z L(x t+1 , z, s : 



s t+1 <— s* + a(z 
until Terminated 



t+i 



Ax t+1 ) 



3 K (x,x t ,z t ,a) 



When the objective function admits a separable form (fJJ 
and one uses the auxiliary function Q z ( ) in ( 111 bb . the z- 
minimization in line [5] separates into m scalar optimizations. 
However, due to the quadratic term ||Ax|| 2 in ( 1 1 1 al l, the x- 
minimization in line |4] does not separate for general A. To 
circumvent this problem, one might consider a separable in- 
exact x-minimization, since many inexact variants of ADMM 
are known to converge |[T2l . For example, Q x {-) might be 
chosen to yield separability while majorizing the original cost 
in line [4] as was done for ISTA's line [6] i.e., 



Q x (x,x t ,z*,a) 



(12) 



= |||z* - Ax|| 2 + i(x - xT(d - «A T A)(x - x*) 
with appropriate c, after which ADMM's line[4]would become 



argmin/ 3; (x) 



*-A T Ax* 



(13) 



Many other choices of penalty Q x {-) have also been consid- 
ered in the literature (see, e.g., the overview in [18|), yielding 
connections to Douglas Rachford splitting [12], split Bregman 
|[L3l . split inexact Uzawa |[T4ll . proximal forward-backward 
splitting iflSl . and various primal-dual algorithms ifTBI - EOl . 
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Other variants of ADMM are also possible IfTTI . For ex- 
ample, the step-size a might vary with the iteration t, or the 
penalty terms might have the form (z — Ax) T P(z — Ax) for 
positive semidefinite P. As we will see, these generalizations 
provide a connection to GAMP. 

III. Fixed-Points of Max-Sum GAMP 

Our first result connects the max-sum GAMP algorithm to 
inexact ADMM. Given points (x, z), define the matrices 

Qx := (diag(d ;c )+A T diag(d z )A) * (14a) 

Q z := (diag(d z )- 1 +Adiag(d ;c )- 1 A T ) (14b) 

where d^ and d z are the componentwise second derivatives 

d 2 /,(x) d*f z (z) 
a x := — — — , d 2 := — — — . (15) 



3x 2 ' 2 ' dz 2 ' 

Note that when f x (x.) and f z (z) are strictly convex, the 
vectors d^ and d z are positive. Observe that the matrix Q x in 
( fl4l is the inverse Hessian of the objective function F(x, z) 
constrained to z = Ax. That is, 



Q, 



dx 



•F(x,Ax) 



Theorem 1: The outputs of the max-sum GAMP version of 
Algorithm [T] satisfy the recursions 



j+i _ 



arg min L(x, z* , s*) + \ ||x - x* \\ 2 Tt 



1 



argminL(x + , z, s ) + -||z - Ax 



t+l II 2 



-t+l 



-t+l 



t+l 



Ax t+i ) 



(16a) 
m (16b) 
(16c) 



where L(x, z, s) is the Lagrangian defined in ( fTob . 

Now suppose that x, z, s, p, T r , . . . are fixed points of the 
algorithm (where the "hats" on x and z are used to distinguish 
them from dummy variables). Then, the fixed points are critical 
points of the constrained optimization ([TJ in that z = Ax and 

d d 

— L(x,z,s) = 0, — L(x,z,s)=0. (17) 
ox az 

Also, the quadratic terms t x , t s are the approximate diagonals 
(see Appendix [A) to the inverse Hessian Q x and the matrix 
Q z in (T4]l at (x,z) = (x,z). 

Proof: See Appendix [B] ■ 
The first part of the Theorem, equation ([16) , shows that 
max-sum GAMP can be interpreted as the ADMM Algo- 
rithm [3] with adaptive vector- valued step-sizes r* and and a 
particular choice of penalty Q x {-). To more precisely connect 
GAMP and existing algorithms, it helps to express GAMP's 
x-update ( |16a| > as the 9 = case of 



arg min/ :E (x) 



T*A r (0(s*- 1 -s t )-s t ) 



(18) 

and recognize that the ISTA-inspired inexact ADMM x-update 
(D~3T i coincides with the 9 = 1 case under step-sizes a = 1/t* 
and c = 1/t*. The convergence of this algorithm for particular 



6 G [0,1] was studied in [18|-[20| under convex functions 
fx( m ) an d /z(0 an d fixed scalar step-sizes. Unfortunately, 
these convergence results do not directly apply to the adaptive 
vector-valued step-sizes of GAMP. However, the second part 
of the Theorem shows at least that, if the algorithm converges, 
its fixed points will be critical points of the constrained 
optimization (HJ. In addition, the quadratic term t x can be 
interpreted as an approximate diagonal to the inverse Hessian. 

Finally, it is useful to compare the fixed-points of GAMP 
with those of standard BP. A classic result of ||30) shows that 
any fixed point for standard max-sum loopy BP is locally 
optimal in the sense that one cannot improve the objective 
function by perturbing the solution on any set of components 
whose variables belong to a subgraph that contains at most 
one cycle. In particular, if the overall graph is acyclic, any 
fixed-point of standard max-sum loopy BP is globally optimal. 
Also, for any graph, the objective function cannot be reduced 
by changing any individual component. The local optimality 
for GAMP provided by Theorem Q] is weaker in that the 
fixed-points only satisfy first-order conditions for saddle points 
of the Lagrangian. This implies that, even an individual 
component may only be locally optimal. 

IV. Fixed-Points of Sum-Product GAMP 
A. Variational Formulation 

The characterization of the fixed points of the sum-product 
GAMP algorithm is somewhat more complicated to describe, 
and requires a certain variational interpretation - a common 
framework for understanding sum-product loopy BP [32 1, [34 1. 
For any fixed observation y, the density function p x ,z|y ("i "|y) 
in (f3]l must minimize 



Px,z|y 



arg min_D(&||p X! 

b 



(19) 



where the minimization is over all density functions b(x, z) on 
the set z = Ax and D(b\ \p XtZ \ y ) is the Kullback-Leibler (KL) 
divergence. Now, let p x | y (x|y) and p z | y (z|y) be the marginal 
densities for the posterior p x ,z|y Using the separable nature 
of ^(x, z) in (|2j, it can be verified that the minimization (fT9l 
can be rewritten as 

(Px|y,Pz| y ) = arg min J K l (K, b z ) s.t. b z = T A b x , (20) 

where the minimization is over density functions b x (x) and 
b z (z) and JKL{b x ,b z ) is the functional 

JklOzA) := D(b x \\e~f*) + D(b z \\e- f *) + H(b z ). (21) 

In j20i . we have used the notation b z = Tj^bx to indicate 
that b z (z) is the density for a random vector z = Ax with 
x ~ b x (x). Thus, p z | y = TAP x |y. Note that we are treating 
A as deterministic. 

Our next main result, Theorem [2] below, will show that 
the fixed points of the sum-product GAMP algorithm can be 
interpreted as critical points of the optimization (l20l . but with 
three key approximations: First, similar to what is known as a 
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mean-field approximation, the optimization is performed only 
over factorizable density functions of the form 

n m 

M*0 = II M^)> b z (z) = Y[b Zi {zi). (22) 
j=i i=i 

Secondly, the objective function in ( f2TT > is replaced by 

J S p{b x ,b z ,T p ) := D(6 x ||e- / -) + D(&*||e~ /a ) 

+#gauBs(fy»Tp) (23) 

where x p is a positive vector and H ga .uss(b z ,T p ) is the 
Gaussian upper bound on the entropy 



— !— var(z,|6 Zl ) + i log(27rr pi ) 



(24) 

The third and final approximation is that the constraint that 
b z = T&b x is replaced by weaker moment matching con- 
straints E(z\b z ) = AE(x|6 x ) and t'p = Svar(x|6 x ). The 
resulting optimization is 

(b x ,b z ,T p ) = &rgminJ SP (b x ,b z ,T p ) (25a) 
s.t. E(z|6 z ) = AE(x|6 x ), r p = Svar(x|6 x ). (25b) 

Note that in (125bl i. the variance var(x|6 x ) is to be inter- 
preted as a vector (not a covariance matrix) with components 
vax(xj\b x ). The next lemma provides a certain Gaussian 
interpretation to this approximate optimization (|231 . 

Lemma 1: For any positive vector r p , and density functions 
b x and b z , Jsp(b x , b z ,T p ) is an upper bound: 



Jsp{b x ,b z ,T p ) > J K h{b x ,b z ), 



(26) 



with equality in the case when b z is separable and Gaussian 
and t v = var(z|6 z ). 

Proof: See Appendix Icl ■ 

Thus, the optimization (|25] | can be interpreted as an approx- 
imation where the distributions are factorizable and the output 
distribution b z is assumed to be Gaussian. We will thus call 
the optimization ( |25t the Gaussian approximate optimization. 
This Gaussian approximate optimization is consistent with the 
manner in which the sum-product GAMP algorithm is derived: 
In standard loopy belief propagation, the sum-product updates 
assume independent, and thus factorizable, distributions at 
the input and output nodes. Moreover, the GAMP variant 
of algorithm additionally applies a Central Limit Theorem 
approximation to justify that the output distributions are ap- 
proximately Gaussian. 

It is important to realize that the optimization (EBT l is neither 
necessarily an upper nor lower bound to (l20V Although the 
objective function satisfies the upper bound ( 1231 , the moment 
matching constraints ( |25bt are weaker than b z = Tpjj x . 



B. Equivalent Optimization 

Corresponding to the variational objective function 



define 



where the terms on the right-hand side are the constrained 
optimizations 



if P (x,T x ) :=min D(b x \ \e f ") 

s.t. E(x|6 x ) = x. var(x|6 x ) = r ; 



F^ P (z 1 T p ) := minD(6 z ||e 
s.t. E(z\b z ) = z. 



(28a) 
(28b) 



Lemma 2: Suppose that (x,z, t x ,t p ) are solutions to the 
optimization 

(%z,t x ,t p ) =argminFsp(x,z,r I ,T p ) (29a) 

s.t. z = Ax, t p = St x . (29b) 

Then the densities (b x , b z ) given by the solutions to the opti- 
mizations in (l28l are minima of the variational optimization 



Conversely, given any solution (b x ,b z ) to the variational 
optimization ((25), the vectors 

x = E(x|6 x ), r x = var(x|& x ), (30a) 
z = E(z\b z ), t p = St x , (30b) 

are solutions to the optimization (|29l . 

Proof: See Appendix iDl ■ 
The lemma shows that the variational optimization (|29l 
over densities is equivalent to a vector-valued optimization 
So, we need only consider the vector-valued optimization 
Corresponding to this constrained optimization, define 
the Lagrangian 

L sp (%z,t x ,t p ,s) = F SP (x,z,^T p ) + s T (z-Ax), (31) 

where s represents a vector of dual parameters. We can now 
state the main result. 

Theorem 2: Consider the outputs of the sum-product 
GAMP version of Algorithm Q] Then, the updates for x* and 
t* are equivalent to 



(x t+i , r* +L ) = arg min L SP (x, z* , t x , r* , s* ) 



(32) 



where Lsp(x, z,s) is the Lagrangian in (1101 , In addition, the 
updates for z', and s* are equivalent to 



p 

J+i 



arg min 

z 



j+i _ 



To 



L S p(x'+\z,t«,7-<+V) 
Ax^H^ 

~P 

4+1 - Ax t+1 ) 



o r l J+i t+L _t- 
2 ^ X SP(x ,Z ,T X 



t+1 _t+l _t+l _t+l „t+l\ 



(33a) 

(33b) 
(33c) 

(33d) 



F SP (% Z, T x ,T p ) = if P (x, r x ) + F s z P (z, T p ). (27) 



Moreover, any fixed point of the sum-product GAMP algo- 
rithm is a critical point of the Lagrangian (|3TV In addition, 
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the density functions for the minimization in ((28) are given 

&*(x) = p(x|r, T r ), 6 z (z) = p(z|p, t"), (34) 

where p(x|r, t t ) and p(z\p,T p ) are given by ||8). 

Proof: See Appendix [E] ■ 
Theorem [2] shows a relation between sum-product GAMP 
and both the ISTA and ADMM methods described earlier. 
Specifically, define the variables 

u:=(x,r x ), v:=(z,r p ). 

Due to the separable structure of the objective function (l28l ). 
the optimization (f29t can be regarded as minimizing a sepa- 
rable function Fg P (u) + Fg P (v) with linear constraints ( 129b) 
between u and v. In this context, the x and z minimizations in 
( 1321 and P3bD follow the format of the ADMM minimizations 
in Algorithm [3] for certain choices of the auxiliary functions. 
On the other hand, the optimization over r x and t p com- 
ponents follow the gradient-based method in the generalized 
ISTA method in Algorithm So, the sum-product GAMP 
algorithm can be seen as a hybrid of the ISTA and ADMM 
methods for the optimization (l29l . which is equivalent to the 
variational optimization (1251 . 

Unfortunately, this hybrid ISTA-ADMM method is non- 
standard and there is no existing convergence theory on 
the algorithm. However, Theorem [2] at least shows that if 
the sum-product GAMP algorithm converges, its fixed points 
correspond to critical points of optimization (129V 

It is useful to briefly compare Theorem [2] with the varia- 
tional interpretation of standard loopy BP. It is well-known 
[32 1 that the fixed points of standard loopy BP can be 
interpreted as distributions on the factor and variable nodes 
that minimize the so-called Bethe free energy subject to certain 
local consistency constraints. In comparison, GAMP appears 
to minimize a Gaussian approximation of the KL divergence 
subject to weaker moment matching constraints between the 
distributions on the variable nodes. In this manner, the fixed- 
points of GAMP appears closer in form of those of expectation 
propagation (EP) methods that can also be interpreted as 
saddle points of a certain free energy subject to moment 
matching [33 1. However, the exact relation between EP and 
sum-product GAMP fixed points requires further study. 

Conclusions 

Although AMP methods admit precise analyses in the 
context of large random transform matrices A, their behavior 
for general matrices is less well-understood. This limitation 
is unfortunate since many transforms arising in practical 
problems such as imaging and regression are not well-modeled 
as realizations of large random matrices. To help overcome 
these limitations, this paper draws connections between AMP 
and certain variants of standard optimization methods that 
employ adaptive vector-valued step-sizes. These connections 
enable a precise characterization of the fixed-points of both 
max-sum and sum-product GAMP for the case of arbitrary 
transform matrices A. The convergence of AMP methods for 
general A is, however, still not fully understood. Simulations 
(not shown here) have indicated, for example, that under 



general choices of A, AMP may diverge. We hope that the 
connections between AMP and standard optimization methods 
provided here help to better understand, and even improve, 
AMP convergence with general matrices. 

Appendix A 
Approximate Diagonals 

Given a matrix A <E ]R mx " and positive vectors d x £ R ra 
and d z , consider the positive matrices (fl4l . We analyze these 
asymptotic behavior of these matrices under the following 
assumptions: 

Assumption 1: Consider a sequence of matrices Q x and Q 2 
of the form ( fl4l i. indexed by the dimension n satisfying: 

(a) The dimension to is a deterministic function of n with 
lirrin^oo m/n — (3 for some /3 > 0, 

(b) The positive vectors d x and d 2 are deterministic vectors 
with 

limsup Hdxlloo < oo, limsup ||d a [[oo < oo. 

n— >oo n— >oo 

(c) The components of A are independent, zero-mean with 
var( Ay ) = Sij for some deterministic matrix S such that 

lim sup max nSij < oo. 
n »J 

Theorem 3 ( H42V ): Consider a sequence of matrices Q x 
and Q 2 in Assumption [T] Then, for each n, there exists 
positive vectors let £ x and £ 2 nonlinear equations 

^ = j- + S^, ^- = d x + S T £ z , (35) 

where the inverses on the left side are componentwise. More- 
over, the vectors £ 2 and £ x are asymptotic diagonals of Q x 
and Q 2 in the following sense: For any deterministic sequence 
of positive vectors u x £ M™ and u z £ M™, such that 

limsup Huxlloo < oo, limsup Hu^oo < oo, 

n— >oc n— >oo 

the following limits hold almost surely 

1 " 

n— >oo n * — » 

3=1 
^ m 

lim — V* [it*i((Q*)« = °- 

n— >oo to — ' 

i=l 

Proof: This result is a special case of the results in [42 1. 

■ 

The results says that, for certain large random matrices A, 
£ x and £ 2 are approximate diagonals of the matrices Q x and 
Q 2 , respectively. This motivates the following definition for 
deterministic A. 

Definition 1: Consider matrices Q x and Q 2 of the form 
(TBI for some deterministic (i.e. non-random) A, d x and d 2 . 
Let S = |A| 2 be the componentwise magnitude squared of A. 
Then, the unique positive solutions £ 2 and £ x to (l35l l will be 
called the approximate diagonals of Q x and Q 2 . 
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Appendix B 
Proof of TheoremQ] 



To prove dl6bt , observe that 



arg max 

z 

(a) 



arg max 



-,t-l\T„ 



/ z (z) + ( S «) J z + -||z-Ax*||^ 



(b) , f f. (c) 4. 

= argmaxF 2 (z,p ,r p ) = z , 



where (a) follows from substituting (|2) and (TlOb into (I16bt and 
eliminating the terms that do not depend on z and (b) follows 
from the definition of p* in line [H] and F z in ( Rbk and (c) is 
the definition of z* in line [10] This proves ( 1 16bb . The update 
d!6ab is proven similarly. To prove (116ch observe that 



(a) 1 



4(.'-Ax*) 



where (a) follows from the update of s* in line[T6]in Algorithm 
[T] (recall that the division is componentwise); and (b) follows 
from the update for p* in line [8] We have thus proven 
the equivalence of the max-sum GAMP algorithm with the 
Lagrangian updates ([T6l >. 

Now consider any fixed point of the max-sum GAMP 
algorithm. A fixed point of ( 116cl i. requires that 



Ax 



(36) 



so the vector satisfies the constraint of the optimization ([TJ. 
Now, using (l36l l and the fact that z is the maxima of (I16bb . 
we have that 

d 



L(x,z, s) = 0. 



dz 



Similarly, since x is the maxima of (I16ab , we have that 



d T 

— L M s x,z, s) 
ox 



0. 



Thus, the fixed point (x, z, s) is a critical point of the La- 
grangian ([Tol l. 

Finally, consider the quadratic terms t x and r r at a fixed 
point. From the updates of the t x and r r in Algorithm Q] and 
the definition of d x in (fT5l l, we obtain 



1 1 rp 

— = d x + — = d x + S 1 t s 



Similarly, the updates of r s and t p show that 



— = 3- + r p = -T- + St x 
t. d z d z 



(37) 



(38) 



From Definition [T] t x and t s are approximate diagonals of 
Q x and Q 2 in (fl4] i. 



Appendix C 
Proof of LemmaQ] 

For any positive vector r p and density function b z (even if 
it is not separable), we have the bound 



(a) m 

wo < E^(^) 

»=i 

;iog(27revar(« i |6 Zi )) 

log(27rr Pi ) 



(6) 

< 



(c) 
< 



2 ^ 



2 ^ 



var(zi|6 x J 



H, 



;auss v 17 ^ ) 



(39) 



where in (a), b Zi is the marginal distribution on the component 
Zi of z; (b) is the Gaussian upper bound on the entropy of each 
marginal distribution, (c) uses the fact that 

v 

log(27rew) < — hlog(27rr), 
r 

for all t with equality when r = v and (d) is the definition 
of flgauss m < l24l ). When 6 Z is separable and Gaussian all the 



inequalities are equalities in (1391 . The inequality 
follows from comparing (fJT) and (|23V 

Appendix D 
Proof of Lemma|2] 

It is useful here and in subsequent proofs to introduce the 
following notation. Partition the objective function (|23l as 

Jsp(b x ,b z , r p ) = JfpOx) + Jf P (&*, Tp), (40) 

where 

J| P (6 X ) := D(& x ||e-^ 
J|p(6,,t p ) := D(b z \\e-f* 

Then, we can rewrite 



(41a) 

H sauss (b z ,T p ). (41b) 



as 



i^sp( x , Tx) ■= min J| P (6x) 

s.t. E(x|6 x ) = x, var(x|6 x ) 
^Ip(z, T p ) := min J| P (6 Z , T p ) 
s.t. E(z|& z ) = z. 



(42a) 



(42b) 



Now, fix a positive vector r p and consider the minimization 
with the additional constraints that 

x = E(x\b x ), t x =var(x|6 x ), (43a) 
z = E(z|S 2 ) (43b) 

for some vectors x, z, t x and t p . Then, using the partition 
d40l >. the minima of (l25l l under the constraints (|43T > is precisely 
the function Fsp(x, z, r x , T p ) in ( |27] |. Thus, the minimization 
( f25b can be achieved by minimizing Fg P (x, z, t x , t p ) under 
the constraints that z = Ax and t p — St x . This minimization 
is precisely the optimization ( |29l and this fact establishes the 
equivalence between the two optimizations. 



Appendix E 
Proof of Theorem[2] 

Our proof will now follow three parts: First, we will provide 
an explicit characterization for optimization problems of the 
form ( |28l >. Next, we will use this characterization to prove that 
the sum-product GAMP updates are equivalent to the ADMM- 
ISTA iterations in (f32t and (l33l l. Finally, we will show that 
the fixed points of the iterations correspond to critical points 
of the Lagrangian d3TV 

A. KL Minimization with Moment Constraints 

We begin with a standard result on the minimization of the 
KL divergence subject to moment constraints. 

Lemma 3: Let f(u) be any real-valued measurable function 
on a real variable u. Given a mean value u and variance 
t u > 0, then the following are equivalent statements about 
a probability density function b(u): 

(a) The probability density function b(u) is the solution to 
the constrained optimization: 



b — argminZ3(6||e ■**) 

s.t. E(u|6) = u, var(w|6) 



(44) 



(b) There exists q and r q > such that the density function 
is the solution to the unconstrained optimization 



arg mm 

b 



D(b\\e 



ZT q 



q) 2 \b) 



(45) 



and ¥,(u\b) = u, var(w|6) = t u . 
(c) There exists q and r q > such that the density function 
is of the form 



b oc exp 



-f( u )-^—( u -Q) 
Zt„ 



(46) 



and E(u|6) = u, var(w|6) = r u . 

Proof: This result is standard - similar calculations are in 
|34|. The equivalence between (a) and (b) can be shown via a 
Lagrangian argument and the equivalence between (b) and (c) 
can be found by taking the derivatives of the unconstrained 
objective (|45T > with respect to b(u) for each u. ■ 

B. Equivalence of GAMP and ADMM-ISTA Iterations 

We now use Lemma [3] to that the sum-product GAMP 
iterations are equivalent to the ADMM-ISTA iterations in 
Theorem |2] We begin by proving (133bl >. Let z* equal the right- 
hand side of d33bb . We want to show that z* = z*, where z* 
is the output of line Qj] of the sum-product GAMP algorithm. 
To show this, we first observe that 



~t (a) 

z = arg mm 



|z - Ax* 



arg mm 



jIF-p- 



— * ||2 



(47) 



where (a) follows from substituting ( 142b| >, (l27j and (|3H into 
d33b| ) and eliminating the terms that do not depend on z and 
(b) follows from substituting in the definition of p* in line ([8]). 



Now, using the definition of F£ p in ( |42b| i, it follows from (|47] > 
that 

z* =E[z|6J, 



where b z is the density function on z that minimizes 

1 „„, „ , ,„2 - 



b z = argmin 



J§Abz,T;) + -\m(z\b z )-p% t 

L Z p 

Now, this minimization can be simplified as 



(48) 



(49) 



b z = argmin 



£>(6,||e-'«) + 



i=l 



i 



+ -||E(z|6 2 )-p £ || xt 



= arg mm 



[^1 



-f*i 



t\2 



(50) 



where (a) follows from substituting ( 14 lb) and (l24l into ( |49l 
and removing terms that do not depend on b z ; and (b) follows 
from the separability assumption (l22t and the fact that, for 
any density function b Zi , 

var(^|6 z J + \E( Zi \b Zi ) ~ = E (\z t - P \\ 2 \b Zt ) . 
The minimization (|49l is separable with solution 



(51) 



where the marginal density functions are the solutions to 



b Zi (zi) = argmin D(b Zi \\e h ^ 
b.. 



t\2 



From Lemma |3] the solution to the minimization 
by 



b Zi (zi) oc exp 



(52) 
is given 

(53) 



Comparing (IBTT l and ( 1521 with (O we see that 

6 z (z)=p(z|p t ,r*). (54) 

Hence, the expectation in d48l is precisely 

z^ECzIpSt*), 

which agrees with the definition of z* in line [T3] of Algorithm 
Q] Therefore, z* = z* and we have proven (I33bb . 

The proof of ( |32l is similar and also shows that x* +1 and 
t* +1 are the mean and variance of 

6 K (x)=p(x|r t ,T r i ). (55) 

The proof of d33ct is identical to the proof of ( 116c) . 
Finally to prove ( 133dt , we take the derivatives 

® T ICit „* _* _* „t\ 
- — -L S p(x ,z ,T x ,T p ,S ) 

OTp 



(6) 

(J 



9r, 

^ 2T ^ t 



a " gauss 

o- 



•p 



1 



' ^ (e) 1 t 



9 



where (a) follows from substituting in (l27j i and (l3~TT l and 
removing the terms that do not depend on r p ; (b) follows from 
the definition of F| p in d42bk (c) follows from the definition 
of J| P in (14 1 bl > ; (d) can be verified by simply taking the 
derivative of -ff gauss with respect to each component r Pi , and 
(e) follows from the definition of rj in line [17] of Algorithm 
Q] This proves ( 133d| >, and we have established that the sum- 
product GAMP updates are equivalent to d32l and ( 13 31 , 



C. Characterization of the Fixed Points 

We conclude by showing that the fixed points of the sum- 
product GAMP algorithm are critical points of the Lagrangian 
(|3TT l. To account for the constraint that r v = St x , define the 
modified Lagrangian, 

Lsp-mod(x,z, t x ,s) = L sp (%z,t x ,St x ,s), (56) 

which is the Lagrangian Lsp(x, z, t x ,t p , s), with t p = Sr x . 

Now consider any fixed point of the sum-product GAMP 
updates. Since they are fixed points, we will drop the depen- 
dence on t, and write the fixed points as x, z, s, etc. To show 
that the fixed points are critical points of the optimization d29l ), 
we will show that the vectors are critical points of the modified 
Lagrangian isp-mod an d satisfy the constraint that z = Ax. 

Now, the fixed points of the sum-product GAMP algorithm 
must be fixed points of the Lagrangian updates (l33l and d3"2l . 
In particular, a fixed point of ( |33cK requires that 

z = Ax. (57) 

Also, from d33at , we have that 

t p = St Xi (58) 

so the vectors satisfies the constraints (129bl i in the optimiza- 
tion. 

Now, using d57b and the fact that z is the minima of d33bl ), 
we have that 

9 

— X SP (x,z, T x ,T p ,s) = 0. (59) 
Due to ( f58l ). the derivative d59l implies that 

„ mod (%z,T x ,s) = 0. (60) 



9 T 



Similarly, since x is the minima of d32l >. we have that 

d 

■^=£sP-mod(x,Z,T x ,s) = 0. 

ax 

The minimization ( 1321 also implies that 



Therefore, 



_d_ 

d 
dr x 

(a) 



L SP (x,z,T x ,T p s) = 

LsP-mod(x,Z,T X! s) 
d 



(61) 



(62) 



dn 



-L 3 p(x,Z,T x ,T p ,s) 



(6) 



1 



d 



^Sp(x,Z,T x ,T p ,s) 



where (a) follows from the the definition Lsp-mod m d56] >; (b) 
follows from (|33db and d62l . The derivatives (ISTl l, (|60l and 
(|6T]> along with the constraints (|57| i and d58l show that the 
vectors x, z, r x and r v are critical points of the optimization 
d29i i. Finally, using Lemma [3] and similar arguments as the 
derivation of (l54l and (l55T > show that the density functions are 
b x and b z in the minimizations in d28l are given by (l34l . 



(63) 
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